Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
Abstract
Sentence-level alignment of bilingual parallel corpora plays a significant and indispensable role in machine translation, translation knowledge acquisition, and bilingual lexicography research, and is fundamental work for natural language processing. Although a great deal of work has been done on sentence alignment and a variety of methods have been developed for bilingual terminology extraction, these methods are impractical for Tibetan information processing, which is only newly underway, because they require a large number of manually prepared sentences as a training corpus when extracting inter-translatable word pairs. This paper proposes a multi-strategy Tibetan-Chinese sentence alignment method based on sentence length, syntactic rules, and a bilingual dictionary. We test our approach on a bilingual corpus crawled from a bilingual website and perform a manual evaluation of the bilingual sentence pairs extracted from the Tibetan-Chinese corpora.
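As a rough illustration of how two of the three cues named in the abstract (sentence length and a bilingual dictionary) might be combined, here is a minimal Python sketch. It is not the authors' method: the function names, the equal weighting, the `expected_ratio` parameter, and the toy tokens and lexicon are all assumptions made for this example, and the syntactic-rule component is omitted because the abstract does not describe it.

```python
from typing import Dict, List, Set


def length_score(src_len: int, tgt_len: int, expected_ratio: float = 1.0) -> float:
    """Return a value in [0, 1]; 1.0 means the lengths match after scaling.

    expected_ratio is a hypothetical target/source length ratio; in practice
    it would have to be estimated from data.
    """
    if src_len <= 0 or tgt_len <= 0:
        return 0.0
    scaled = src_len * expected_ratio
    return min(scaled, tgt_len) / max(scaled, tgt_len)


def dictionary_score(src_tokens: List[str], tgt_text: str,
                     lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source tokens with a listed translation found in the target."""
    if not src_tokens:
        return 0.0
    hits = sum(
        1 for tok in src_tokens
        if any(trans in tgt_text for trans in lexicon.get(tok, ()))
    )
    return hits / len(src_tokens)


def pair_score(src_tokens: List[str], tgt_text: str,
               lexicon: Dict[str, Set[str]], weight: float = 0.5) -> float:
    """Weighted sum of the length cue and the dictionary cue (weight is assumed)."""
    return (weight * length_score(len(src_tokens), len(tgt_text))
            + (1.0 - weight) * dictionary_score(src_tokens, tgt_text, lexicon))


def best_match(src_tokens: List[str], candidates: List[str],
               lexicon: Dict[str, Set[str]]) -> int:
    """Index of the highest-scoring candidate target sentence."""
    scores = [pair_score(src_tokens, c, lexicon) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy data: "t1".."t3" stand in for segmented source-side tokens and the
# lexicon entries are placeholders, not real Tibetan-Chinese translations.
toy_lexicon = {"t1": {"甲"}, "t2": {"乙"}}
toy_source = ["t1", "t2", "t3"]
toy_candidates = ["丁丁丁丁丁丁丁丁丁", "甲乙丙"]
print(best_match(toy_source, toy_candidates, toy_lexicon))
```

A real aligner would also need to handle one-to-many and many-to-one sentence mappings, which this pairwise scoring does not attempt.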
Similar resources
Phrase Alignment Based on Combination of Multiple Strategies
Phrase translation pairs are very useful for bilingual lexicography, machine translation systems, cross-lingual information retrieval, and many other applications in natural language processing. The parse trees of sentences contain phrase boundary information. Linguistic knowledge in translation lexicons and semantic lexicons, and statistical results from bilingual corpora can be used to align Chinese wo...
Semi-supervised Chinese Word Segmentation based on Bilingual Information
This paper presents a bilingual semi-supervised Chinese word segmentation (CWS) method that leverages the natural segmentation information of English sentences. The proposed method involves learning three levels of features, namely character-level, phrase-level, and sentence-level, provided by multiple sub-models. We use a sub-model of conditional random fields (CRF) to learn monolingual grammars, ...
Multiple Linear Regression for Extracting Phrase Translation Pairs
Phrase translation pairs are very useful for bilingual lexicography, machine translation systems, cross-lingual information retrieval, and many other applications in natural language processing. Phrase translation pairs are always extracted from bilingual sentence pairs. In this paper, we extract phrase translation pairs based on word alignment results of Chinese-English bilingual sentence pairs and par...
Word Alignment of English-Chinese Bilingual Corpus Based on Chunks
In this paper, a method for the word alignment of an English-Chinese corpus based on chunks is proposed. The chunks of English sentences are identified first. Then the chunk boundaries of Chinese sentences are predicted from the translations of English chunks and heuristic information. The ambiguities of Chinese chunk boundaries are resolved by the coterminous words in English chunks. With the chu...
Evaluating the Quality of Web-Mined Bilingual Sentence Pairs
We raise the problem of evaluating the quality of bilingual sentence pairs mined from the web, which is critical for a wide range of applications such as statistical machine translation (SMT) and English as a Second Language (ESL) learning. To address this problem, we propose a novel method that integrates multiple linguistic features related to spelling, grammar, alignment, and particular...
Publication date: 2015